Goto

Collaborating Authors

 pascal dataset


Improving Multi-Label Contrastive Learning by Leveraging Label Distribution

arXiv.org Artificial Intelligence

In multi-label learning, leveraging contrastive learning to learn better representations faces a key challenge: selecting positive and negative samples and effectively utilizing label information. Previous studies selected positive and negative samples based on the overlap between labels and used them for label-wise loss balancing. However, these methods suffer from a complex selection process and fail to account for the varying importance of different labels. To address these problems, we propose a novel method that improves multi-label contrastive learning through label distribution. Specifically, when selecting positive and negative samples, we only need to consider whether there is an intersection between labels. To model the relationships between labels, we introduce two methods to recover label distributions from logical labels, based on Radial Basis Function (RBF) and contrastive loss, respectively. We evaluate our method on nine widely used multi-label datasets, including image and vector datasets. The results demonstrate that our method outperforms state-of-the-art methods in six evaluation metrics.


Heart Sound Segmentation Using Deep Learning Techniques

arXiv.org Artificial Intelligence

Heart disease remains a leading cause of mortality worldwide. Auscultation, the process of listening to heart sounds, can be enhanced through computer-aided analysis using Phonocardiogram (PCG) signals. This paper presents a novel approach for heart sound segmentation and classification into S1 (LUB) and S2 (DUB) sounds. We employ FFT-based filtering, dynamic programming for event detection, and a Siamese network for robust classification. Our method demonstrates superior performance on the PASCAL heart sound dataset compared to existing approaches.


Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation

Neural Information Processing Systems

Many methods have been proposed to solve the problems of recovering intrinsic scene properties such as shape, reflectance and illumination from a single image, and object class segmentation separately. While these two problems are mutually informative, in the past not many papers have addressed this topic. In this work we explore such joint estimation of intrinsic scene properties recovered from an image, together with the estimation of the objects and attributes present in the scene. In this way, our unified framework is able to capture the correlations between intrinsic properties (reflectance, shape, illumination), objects (table, tv-monitor), and materials (wooden, plastic) in a given scene. For example, our model is able to enforce the condition that if a set of pixels take same object label, e.g.


Deep Learning Based Classification of Unsegmented Phonocardiogram Spectrograms Leveraging Transfer Learning

arXiv.org Artificial Intelligence

Cardiovascular diseases (CVDs) are the main cause of deaths all over the world. Heart murmurs are the most common abnormalities detected during the auscultation process. The two widely used publicly available phonocardiogram (PCG) datasets are from the PhysioNet/CinC (2016) and PASCAL (2011) challenges. The datasets are significantly different in terms of the tools used for data acquisition, clinical protocols, digital storages and signal qualities, making it challenging to process and analyze. In this work, we have used short-time Fourier transform (STFT) based spectrograms to learn the representative patterns of the normal and abnormal PCG signals. Spectrograms generated from both the datasets are utilized to perform three different studies: (i) train, validate and test different variants of convolutional neural network (CNN) models with PhysioNet dataset, (ii) train, validate and test the best performing CNN structure on combined PhysioNet-PASCAL dataset and (iii) finally, transfer learning technique is employed to train the best performing pre-trained network from the first study with PASCAL dataset. We propose a novel, less complex and relatively light custom CNN model for the classification of PhysioNet, combined and PASCAL datasets. The first study achieves an accuracy, sensitivity, specificity, precision and F1 score of 95.4%, 96.3%, 92.4%, 97.6% and 96.98% respectively while the second study shows accuracy, sensitivity, specificity, precision and F1 score of 94.2%, 95.5%, 90.3%, 96.8% and 96.1% respectively. Finally, the third study shows a precision of 98.29% on the noisy PASCAL dataset with transfer learning approach. All the three proposed approaches outperform most of the recent competing studies by achieving comparatively high classification accuracy and precision, which make them suitable for screening CVDs using PCG signals.


Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation

Neural Information Processing Systems

Many methods have been proposed to recover the intrinsic scene properties such as shape, reflectance and illumination from a single image. However, most of these models have been applied on laboratory datasets. In this work we explore the synergy effects between intrinsic scene properties recovered from an image, and the objects and attributes present in the scene. We cast the problem in a joint energy minimization framework; thus our model is able to encode the strong correlations between intrinsic properties (reflectance, shape, illumination), objects (table, tv-monitor), and materials (wooden, plastic) in a given scene. We tested our approach on the NYU and Pascal datasets, and observe both qualitative and quantitative improvements in the overall accuracy.


Choosing Linguistics over Vision to Describe Images

AAAI Conferences

In this paper, we address the problem of automatically generating human-like descriptions for unseen images, given a collection of images and their corresponding human-generated descriptions. Previous attempts for this task mostly rely on visual clues and corpus statistics, but do not take much advantage of the semantic information inherent in the available image descriptions. Here, we present a generic method which benefits from all these three sources (i.e. visual clues, corpus statistics and available descriptions) simultaneously, and is capable of constructing novel descriptions. Our approach works on syntactically and linguistically motivated phrases extracted from the human descriptions. Experimental evaluations demonstrate that our formulation mostly generates lucid and semantically correct descriptions, and significantly outperforms the previous methods on automatic evaluation metrics. One of the significant advantages of our approach is that we can generate multiple interesting descriptions for an image. Unlike any previous work, we also test the applicability of our method on a large dataset containing complex images with rich descriptions.